Final Performance Report to Dr. Darema

Anthony Vodacek and John P. Kerekes, RIT Center for Imaging Science
Matthew J. Hoffman, RIT School of Mathematical Sciences

Summary - Simulations of a dynamic data driven sensor, together with an adaptive sampling
strategy, were created using Dynamic Data Driven Applications Systems (DDDAS) principles and
demonstrated to reduce the amount of data required to perform robust vehicle tracking. By
employing an adaptive sampling strategy (controlling the pixel sampling based on analysis of
prior data) for a simulated multimodal sensor whose pixels can be individually addressed,
only about 10% of the pixels in a scene were needed to collect spectral data for feature
matching. Further, the method was shown to require as little as approximately 1.5% of the data
volume of a full hyperspectral sensor. The method employs a Gaussian Sum Filter for the
vehicle-tracking model and an adaptive forecasting strategy that morphs the Gaussian Sum Filter
based on context extracted from OpenStreetMap to improve tracking accuracy. Eliminating
background contamination of the target spectra, by sampling the background at forecasted
vehicle locations, was shown to improve feature matching results, leading to better association
of vehicles with specific tracks and therefore better tracking.

Motivation - Tracking ground objects via electro-optical imaging from a remote platform is a
common Air Force activity. This type of surveillance is often controlled by an operator who
devotes attention to a specific, previously identified object, with little or no opportunity to
understand the wider background context in which the object is maneuvering. In contrast,
spectral cameras and algorithms have been developed for wide-area surveillance and autonomous
tracking of multiple objects. However, these systems generate large data volumes and often
track any moving object, including irrelevant ones. Further, while these systems may reacquire
an object after it moves behind a tall obscuration, they may not recognize the reacquired object
as the same object that was previously tracked and then obscured.

Objective - The objective was to prototype methods for a vehicle tracking system via airborne
imaging that would overcome the limitations of previous object tracking methods by applying
Dynamic Data Driven Applications Systems principles. By implementing dynamic interactions
between tracking models and data information methods appropriate for multi-modal adaptive
sensors, the system would allow an operator to mark multiple vehicles for tracking in an image,
after which those specific objects would be tracked autonomously while maintaining object
association despite obscurations and a variety of vehicle maneuvers.

Methods and Outcomes - True to the principles of DDDAS, the methods created for the system
all adjust the system to the content of the image data being collected (Figure 1). The work
focused on three components to which DDDAS principles were applied. First, the sensor
considered is itself multi-modal and controllable at the pixel level, so forward-looking
sampling strategies based on analysis of the current image collection were demonstrated.
Second, the background of the moving vehicle naturally changes, so the system continuously
monitors and updates the changing background to improve separation of the vehicle from the
background during feature matching. Third, logical decisions on future vehicle movements, based
on the geometry of the road network derived from OpenStreetMap, are used to morph the Gaussian
Sum Filter to improve the efficiency of tracking.

Figure 1. Flow chart of the vehicle tracking system.

In parallel with the DDDAS development, a computationally simple method for assessing
spectral similarity that could be applied in feature matching was demonstrated. Finally, by using
the Digital Imaging and Remote Sensing Image Generation (DIRSIG) model to create
surveillance image simulations, the approach was developed without expensive and
time-consuming fieldwork, adaptive multimodal sensors that exist only in design studies could be
tested, and complex obscuration and viewing scenarios could be explored. These methods are
summarized below.

Adaptive Sampling

Fully exploiting a multimodal adaptive sensor requires sampling strategies. The goal was to
pick the pixels most likely to contain target values rather than background values, so that the
vehicle detection is associated with the correct track. The strategy developed here accounts for
the orientation of the vehicle relative to the columns and rows of the image.

In the orientation sampling strategy, the estimated orientation angle of a target in the image
is compared against two thresholds. If the orientation angle is lower than both thresholds,
horizontal sampling (along image rows) is assigned, since the target is likely traveling
horizontally in the image. If the angle lies between the thresholds, diagonal sampling (along
the diagonal of image rows and columns) is employed to provide a higher number of possible
target pixels. If the angle is higher than both thresholds (the target is oriented vertically),
horizontal sampling is assigned again, since after rotating the image by 90 degrees horizontal
sampling becomes identical to vertical sampling.
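The orientation rule above can be sketched in Python as follows. This is a minimal illustration, not the project's code: the threshold values `T_LOW` and `T_HIGH`, the angle convention (degrees, counterclockwise from the image rows), and the helper names `sampling_mode` and `sample_pixels` are all assumptions. A vertical target is handled by sampling down one image column, which is what horizontal sampling of the 90-degree-rotated image amounts to.

```python
import numpy as np

# Hypothetical thresholds in degrees; the report does not state the values used.
T_LOW, T_HIGH = 30.0, 60.0


def sampling_mode(angle_deg: float) -> str:
    """Choose a sampling pattern from the target's estimated orientation.

    angle_deg is measured counterclockwise from the image rows. It is folded
    into [0, 90] because a row/column pattern ignores the direction of travel.
    """
    a = abs(angle_deg) % 180.0
    a = min(a, 180.0 - a)
    if a < T_LOW:
        return "horizontal"
    if a <= T_HIGH:
        return "diagonal"
    # Vertical target: horizontal sampling of the image rotated by 90 degrees,
    # which amounts to sampling down a single image column.
    return "vertical"


def sample_pixels(row: int, col: int, half: int, angle_deg: float):
    """Return (row, col) coordinates of 2*half+1 pixels to task for spectra,
    centred on the forecast target position (row, col)."""
    k = np.arange(-half, half + 1)
    mode = sampling_mode(angle_deg)
    if mode == "horizontal":
        rows, cols = np.full_like(k, row), col + k
    elif mode == "vertical":
        rows, cols = row + k, np.full_like(k, col)
    else:
        # Image rows grow downward, so an up-right orientation (positive
        # slope) means the row index decreases as the column index increases.
        sign = 1 if np.tan(np.radians(angle_deg)) > 0 else -1
        rows, cols = row - sign * k, col + k
    return list(zip(rows.tolist(), cols.tolist()))


print(sampling_mode(10), sampling_mode(45), sampling_mode(80))  # horizontal diagonal vertical
print(sample_pixels(10, 10, 1, 45))  # [(11, 9), (10, 10), (9, 11)]
```

Folding the angle keeps the rule symmetric: a vehicle heading east and one heading west receive the same row-based pattern.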

Feature matching and background monitoring and elimination

Because spectral measurements of the background will inevitably be made along with the target
measurements, a method was created to continuously assess and update the background around the
vehicle. Undetected background data in an assumed vehicle spectrum can cause a detected object
to be associated with the wrong track. A background elimination method was therefore designed to
remove background pixel spectra from the spectral measurements of the vehicles. The forecast
location of the target of interest for the next time step of the vehicle movement model can be
assumed to contain background (the vehicle has not yet arrived there), and this background is
sampled and updated continuously as it changes. During adaptive sampling, as described above, a
simple comparison between target and background spectra is performed using the Spectral Angle
Mapper (SAM) to remove background pixels from the vehicle spectra, which improved the
association of the vehicle with its track.
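A minimal sketch of the SAM-based background elimination, assuming the background spectra were sampled at the forecast location before the vehicle arrived. The function names and the 0.05 rad threshold are hypothetical; the report does not give a threshold value. Pixels within the threshold angle of any background spectrum are discarded, and the remaining pixels are averaged into a vehicle spectrum.

```python
import numpy as np


def spectral_angles(X, Y):
    """SAM angle (radians) between every row of X (n, bands) and of Y (m, bands)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return np.arccos(np.clip(Xn @ Yn.T, -1.0, 1.0))


def eliminate_background(target_pixels, background_pixels, max_angle=0.05):
    """Drop sampled vehicle pixels whose spectra match the background.

    target_pixels:     (n, bands) spectra sampled where the vehicle was detected
    background_pixels: (m, bands) spectra sampled earlier at the forecast location,
                       before the vehicle arrived, i.e. background only
    max_angle:         hypothetical SAM threshold in radians
    Returns the mean spectrum of the retained pixels, or None if none remain.
    """
    T = np.asarray(target_pixels, dtype=float)
    B = np.asarray(background_pixels, dtype=float)
    closest_bg = spectral_angles(T, B).min(axis=1)  # angle to most similar background
    kept = T[closest_bg > max_angle]
    return kept.mean(axis=0) if len(kept) else None


vehicle = eliminate_background(
    target_pixels=[[0.30, 0.10, 0.05], [0.10, 0.20, 0.30], [0.20, 0.40, 0.60]],
    background_pixels=[[0.10, 0.20, 0.30]],
)
print(vehicle)  # only the first (vehicle-like) pixel survives
```

The brighter copy of the background ([0.20, 0.40, 0.60]) is also removed, because SAM compares only spectral direction.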

Feature matching is also essential in complex environments, where tracking will often be lost
when vehicles move under trees, behind buildings, or near other moving targets. In these cases,
it is important to re-establish the track when a new detection occurs, rather than treating the
re-detected vehicle as a new object. This is accomplished through feature matching, in which the
spectral features of the new target are compared with those of past targets using spectral data
collected by the sensor at different wavelengths. In a real system, the sensor must be tasked to
collect spectral data without any a priori knowledge of where detections will occur. Therefore,
sampling is performed by taking a subset of the pixels where the forecasting model predicts the
target will be. From the sampled pixels, a spectrum is formed for each prediction component, and
the prediction component whose spectrum best matches an existing track is used to perform
association. After moving objects are detected in the new frame, the detected object closest to
that best-matching prediction component is associated with the corresponding existing track; in
other words, the new object is regarded as a re-detection of the old target. If no preexisting
target is matched, a new track is initiated. The Spectral Angle Mapper (SAM) was used to compare
spectra. It computes the similarity between two spectra as the angle between them, treated as
vectors in band space. Because it considers only the vector direction, it is insensitive to the
magnitude (brightness) of the spectra. This research project also developed a new spectral
similarity measure that includes the magnitude as well as the spectral direction.
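The association logic described above can be sketched as follows, with the SAM angle computed as theta = arccos(x . y / (|x| |y|)). This is a hedged illustration: the acceptance threshold `max_angle`, the input layout, and the function names are assumptions, and returning `(None, None)` signals that a new track should be initiated.

```python
import numpy as np


def spectral_angle(a, b):
    """SAM: angle between two spectra viewed as vectors in band space."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def associate(component_means, component_spectra, track_spectra, detections,
              max_angle=0.1):
    """Return (track_id, detection_index), or (None, None) to start a new track.

    component_means:   (k, 2) predicted positions of the forecast components
    component_spectra: (k, bands) spectrum formed from pixels sampled at each component
    track_spectra:     {track_id: (bands,)} reference spectrum of each existing track
    detections:        (d, 2) positions of moving objects detected in the new frame
    max_angle:         hypothetical acceptance threshold in radians
    """
    best_angle, best_track, best_comp = np.inf, None, None
    for i, spectrum in enumerate(component_spectra):
        for track_id, ref in track_spectra.items():
            angle = spectral_angle(spectrum, ref)
            if angle < best_angle:
                best_angle, best_track, best_comp = angle, track_id, i
    if best_track is None or best_angle > max_angle or len(detections) == 0:
        return None, None  # no match: the detection starts a new track
    # The detection closest to the best-matching component is the re-detection.
    offsets = np.asarray(detections, float) - np.asarray(component_means, float)[best_comp]
    return best_track, int(np.argmin(np.linalg.norm(offsets, axis=1)))


print(associate([[0, 0], [10, 10]], [[1, 0], [0, 1]], {7: [0, 1]}, [[0, 1], [9, 9]]))  # (7, 1)
```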

A new image spectral similarity algorithm

While SAM was used to test spectral similarity in the prototype development, a more powerful
yet simple spectral similarity measure was developed based on the geometric characteristics of
the Mahalanobis distance, so as to incorporate both spectral direction and spectral magnitude.
With minimal human operator input to define representative pixels, the measure was tested
experimentally, and analysis of ROC curves demonstrated the potential advantages of the novel
distance measure for identifying materials, such as vehicles, in urban images. Further details
on this algorithm can be found in the publication by Chen, Sun, and Vodacek listed in the section
below on publications derived from this research.
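The published Mahalanobis-based measure is not reproduced here. As a hedged point of reference only, the sketch below contrasts SAM with the standard Mahalanobis distance to operator-selected representative pixels, showing why a magnitude-aware measure can separate spectra that SAM cannot. `mahalanobis_to_class`, the ridge term and the synthetic data are assumptions for illustration.

```python
import numpy as np


def spectral_angle(a, b):
    """SAM angle in radians: depends only on spectral direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def mahalanobis_to_class(pixel, reference_pixels, ridge=1e-6):
    """Standard Mahalanobis distance from `pixel` to the class defined by
    operator-selected `reference_pixels` (n, bands). A small ridge keeps the
    covariance invertible. This is NOT the measure published by Chen et al."""
    ref = np.asarray(reference_pixels, dtype=float)
    mu = ref.mean(axis=0)
    cov = np.cov(ref, rowvar=False) + ridge * np.eye(ref.shape[1])
    d = np.asarray(pixel, dtype=float) - mu
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))


rng = np.random.default_rng(0)
mean = np.array([0.20, 0.40, 0.30])
refs = mean + 0.01 * rng.standard_normal((50, 3))  # representative pixels
brighter = 2.0 * mean                              # same direction, twice the magnitude

print(spectral_angle(mean, brighter))        # ~0: SAM cannot tell them apart
print(mahalanobis_to_class(brighter, refs))  # large: the magnitude change is detected
```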
Gaussian Sum Filter that adjusts to vehicle movement and intersections

Estimating the state of a target vehicle using data collected by an airborne sensor can be
challenging because target movement may follow a non-Gaussian distribution. A nonlinear filter
can better approximate a non-Gaussian distribution and evolve the corresponding uncertainty. The
previously developed Gaussian Sum Filter (GSF), a nonlinear filter, was employed in this study.
It represents a non-Gaussian distribution by a finite mixture of Gaussian distributions, and the
mean and covariance of each of these density kernels are updated using an Extended Kalman Filter
(EKF). The GSF has two advantages over a single EKF. First, a single EKF represents a nonlinear
model by linearizing it and carries only one Gaussian, whereas the GSF represents a non-Gaussian
problem by a mixture of Gaussian distributions. Second, because the GSF is a bank of EKFs, it was
used to implement a multiple forecasting model set strategy to predict the target vehicle
movement. With a single EKF, only one model could be employed at each time step, so that single
forecasting model would have to be well defined to predict the target movement.

The weights of the density kernels are kept constant while the uncertainty is propagated and
are updated when an observation is available.
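A minimal GSF sketch under stated assumptions: a four-element state [px, py, vx, vy], position-only measurements (`H`), and two illustrative motion models, constant velocity and a coordinated turn with a known, nonzero turn rate. With these linear models the EKF equations reduce to the ordinary Kalman filter; a nonlinear model would return its Jacobian in place of `F`. Weights are held constant during prediction and rescaled by each kernel's measurement likelihood at update, as described above. Noise levels and the turn rate are hypothetical.

```python
import numpy as np

# Assumed measurement model: the sensor observes position only.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])


def cv_model(x, dt=1.0):
    """Constant-velocity model for state [px, py, vx, vy]: returns f(x) and its Jacobian."""
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
    return F @ x, F


def make_turn_model(omega, dt=1.0):
    """Coordinated-turn model with a known, nonzero turn rate omega (rad per step)."""
    s, c = np.sin(omega * dt), np.cos(omega * dt)
    F = np.array([[1, 0, s / omega, -(1 - c) / omega],
                  [0, 1, (1 - c) / omega, s / omega],
                  [0, 0, c, -s],
                  [0, 0, s, c]], float)
    return lambda x: (F @ x, F)


def gsf_predict(weights, means, covs, models, Q):
    """Propagate each Gaussian kernel with its own model (EKF prediction).
    The kernel weights are left unchanged during propagation."""
    preds = [model(m) for m, model in zip(means, models)]
    new_means = [mp for mp, _ in preds]
    new_covs = [F @ P @ F.T + Q for (_, F), P in zip(preds, covs)]
    return weights, new_means, new_covs


def gsf_update(weights, means, covs, z, R):
    """EKF update of every kernel; each weight is rescaled by its kernel's
    measurement likelihood N(z; H m, H P H^T + R) and then renormalized."""
    new_w, new_m, new_P = [], [], []
    for w, m, P in zip(weights, means, covs):
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        innov = z - H @ m
        new_m.append(m + K @ innov)
        new_P.append((np.eye(len(m)) - K @ H) @ P)
        lik = np.exp(-0.5 * innov @ np.linalg.solve(S, innov)) / np.sqrt(np.linalg.det(2 * np.pi * S))
        new_w.append(w * lik)
    new_w = np.array(new_w)
    return new_w / new_w.sum(), new_m, new_P


# Two kernels at the same state: one forecasts straight travel, one a turn.
x0 = np.array([0.0, 0.0, 10.0, 0.0])
w, m, P = gsf_predict(np.array([0.5, 0.5]), [x0, x0], [np.eye(4), np.eye(4)],
                      [cv_model, make_turn_model(0.3)], Q=0.1 * np.eye(4))
w, m, P = gsf_update(w, m, P, z=np.array([10.0, 0.0]), R=0.1 * np.eye(2))
print(w)  # the straight-travel kernel now carries the larger weight
```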

Tests were made to find an appropriate number of Gaussian components for robust tracking in
challenging scenarios. More components can improve tracking but also bring undesired
complexity, so different numbers of components representing a target were tested in a trade
study of tracking performance versus computational complexity. For example, in a design scenario
consisting of a T-type road intersection, where turning left and turning right are the only
possible paths, 6 components were found to be the optimum number to cover one possible path
while keeping the complexity at a desired level. As a result, 12 components (6 for the left turn
and 6 for the right turn) are initially placed in the vicinity of the observation and one
component is placed on the observation itself, for a total of 13 components, while maintaining
the required computational speed.
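The component count can be illustrated with a hypothetical placement rule: one component on the observation and six spread along each branch of the T, giving 1 + 2 x 6 = 13. The geometry (branches perpendicular to the heading, equal spacing, equal initial weights) is an assumption for illustration; the report does not specify where the 12 components are placed beyond "in the vicinity of the observation".

```python
import numpy as np


def t_junction_components(obs, heading, dist_to_junction, spacing, per_branch=6):
    """Initial GSF component means for a vehicle approaching a T-type intersection.

    One component sits on the observation and `per_branch` components are spread
    along each of the left and right branches, giving 1 + 2 * 6 = 13 by default.
    Coordinates assume x to the right and y up; the junction is assumed to lie
    straight ahead along `heading`.
    """
    obs = np.asarray(obs, dtype=float)
    h = np.asarray(heading, dtype=float)
    h = h / np.linalg.norm(h)
    left = np.array([-h[1], h[0]])  # heading rotated +90 degrees
    junction = obs + dist_to_junction * h
    steps = spacing * np.arange(1, per_branch + 1)[:, None]
    means = np.vstack([obs[None, :],
                       junction + steps * left,   # left-turn components
                       junction - steps * left])  # right-turn components
    weights = np.full(len(means), 1.0 / len(means))  # assumed equal initial weights
    return weights, means


w, means = t_junction_components(obs=(0, 0), heading=(0, 1), dist_to_junction=10, spacing=2)
print(len(means))  # 13
```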

Matching the image and the OpenStreetMap road network

Despite the effectiveness of the background elimination approach described above, there
were challenges in maintaining feature matching at intersections, where nonlinearities occur. The
main reason was the difficulty of forecasting nonlinear and complex movement. To improve
forecasting and analysis performance, additional context was added to the system using prior
knowledge of the road network. In the DDDAS sense, the context changes as the vehicle moves
through the environment. Vehicles are likely to follow road networks, so once a car has been
identified, localizing it on a known road reduces the uncertainty in its next location. In other
words, the additional context extracted from the road network allowed the forecasting multiple
model set to be adjusted in advance, resulting in better target tracking. For instance, the
probable paths a target may take were based on the type of intersection (T-type, plus-type,
etc.) the vehicle was approaching.

The extraction of the road network in the image involved three aspects: image-based road
network extraction, vector road to raster imagery conflation, and a seamless integration of both
into a unified framework. The road extraction approach is fairly generic, since OpenStreetMap
(OSM) vector road data are freely and readily available worldwide. OSM quality and coverage
improve over time, which makes OSM a valuable source of prior data for image-based road
extraction. Because of the persistent misregistration between image and map data, map conflation
(making the maps match) is carried out first to adjust the OSM road vectors to align with road
centerlines in the image. Junction templates derived from rasterized OSM road segments are then
matched with image-derived binary road masks and curvilinear response images to conflate
junction points. Non-junction point matching then conflates the remainder of the vector road
network based on the pre-corrected road segments and the image curvilinear structures; width and
orientation estimates are embedded in this step and can be used to recover the piecewise width of
road segments. Finally, a road label mask is created with complete road knowledge: centerline,
width, connectivity, and topology. The approach was tested on some large and diverse image data
sets, which verified its effectiveness and robustness; it achieved at least 80% conflation
correctness and 70% road extraction accuracy on some extremely challenging scenes with dense,
irregular road networks and building shadows.
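The final step, expanding each conflated centerline by its estimated width into a road label mask, can be sketched as follows. For simplicity this assumes a single width per segment (the method above recovers piecewise widths) and omits connectivity and topology; `road_label_mask` is a hypothetical helper.

```python
import numpy as np


def road_label_mask(shape, centerline, width):
    """Rasterize one road segment into a boolean label mask.

    centerline: [(row, col), ...] polyline of the conflated segment
    width:      estimated road width in pixels
    A pixel is labeled road if it lies within width / 2 of any polyline piece.
    """
    rows, cols = np.indices(shape)
    P = np.stack([rows, cols], axis=-1).astype(float)  # (H, W, 2) pixel coordinates
    pts = np.asarray(centerline, dtype=float)
    mask = np.zeros(shape, dtype=bool)
    for a, b in zip(pts[:-1], pts[1:]):
        ab = b - a
        # Parameter of the closest point on segment a-b, clamped to the segment.
        t = np.clip(((P - a) @ ab) / max(ab @ ab, 1e-12), 0.0, 1.0)
        dist = np.linalg.norm(P - (a + t[..., None] * ab), axis=-1)
        mask |= dist <= width / 2.0
    return mask


mask = road_label_mask((20, 20), centerline=[(10, 2), (10, 17)], width=4)
print(mask[:, 10].nonzero()[0])  # rows 8..12 are road in column 10
```

Masks for several segments can be combined with a logical OR to form the full road label mask.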

To identify roads and intersections, the OSM road network data were injected into the
tracking system. The OSM source data are standardized and rasterized, but misregistration with
the image data lingers because of image distortions, topographic change, inaccurate map surveys,
GPS errors, and so on. A method to register the image and map data was therefore developed. The
process starts by identifying intersections, end points, and points of high curvature in the OSM
data to form templates. Using the map coordinates as a first guess, the templates are matched by
searching the extracted image features in the neighborhood of that first guess, which yields the
accurate positions of intersections and curved roads in the image. To account for different types
of roads, different width values are tested during template formation. This process identifies
the important intersections in the image that the tracked target might approach.
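The width-aware template search can be sketched with normalized cross-correlation on a binary road mask. The template shape (straight arms in N/S/E/W directions), the window size, the search radius and the candidate widths are hypothetical choices for illustration; the published method also correlates against curvilinear response images.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)


def junction_template(size, width, arms):
    """Binary junction template with road arms ('N', 'S', 'E', 'W') of a given width."""
    t = np.zeros((size, size))
    c, h = size // 2, width // 2
    band = slice(c - h, c + h + 1)
    if "N" in arms:
        t[:c + 1, band] = 1
    if "S" in arms:
        t[c:, band] = 1
    if "W" in arms:
        t[band, :c + 1] = 1
    if "E" in arms:
        t[band, c:] = 1
    return t


def match_junction(road_mask, guess, arms, size=15, widths=(3, 5, 7), radius=10):
    """Search around the OSM first guess (row, col) for the best template match,
    trying several road widths. Returns (score, (row, col), width)."""
    half = size // 2
    best = (-np.inf, None, None)
    for w in widths:
        t = junction_template(size, w, arms)
        for r in range(guess[0] - radius, guess[0] + radius + 1):
            for c in range(guess[1] - radius, guess[1] + radius + 1):
                if r - half < 0 or c - half < 0:
                    continue  # window falls off the near edge of the image
                patch = road_mask[r - half:r + half + 1, c - half:c + half + 1]
                if patch.shape != t.shape:
                    continue  # window falls off the far edge of the image
                score = ncc(patch, t)
                if score > best[0]:
                    best = (score, (r, c), w)
    return best


# Synthetic road mask: an east-west road with a road branching south at column 32.
road = np.zeros((60, 60))
road[28:33, :] = 1
road[30:, 30:35] = 1
score, location, width = match_junction(road, guess=(27, 28), arms={"E", "W", "S"})
print(location, width)  # (30, 32) 5
```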

Once the road network is identified in a given scenario, all possible paths a target can
travel in a particular time step are available for approximating the prior and posterior
probability density functions of the target. A multiple model set tracking approach, the
Interacting Multiple Model (IMM) filter, was adopted for the tracking system. In essence, the
intersection data are used to adaptively exclude specific models in certain situations. For
example, at a T-type intersection the coordinated turn models (CTM) are applied and the constant
velocity (CV) model is excluded, whereas at a plus-type intersection both the CV and CTM models
are allowed. In the DDDAS sense, the models are adjusted as the vehicle moves through the
environment.
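One way to express this model gating in an IMM filter is sketched below: the standard IMM model-probability prediction c_j = sum_i Pi[i, j] * mu_i is computed, models excluded at the current intersection type are set to zero, and the remainder is renormalized. Splitting CTM into left and right variants, the transition matrix values and the function names are assumptions; the mixing of state estimates and the per-model filters are omitted.

```python
import numpy as np

MODELS = ["CV", "CTM_left", "CTM_right"]
# Models allowed at each intersection type, following the examples in the text.
ALLOWED = {"T": {"CTM_left", "CTM_right"},
           "plus": {"CV", "CTM_left", "CTM_right"}}


def predicted_model_probs(mu, Pi, intersection_type):
    """IMM model-probability prediction restricted by intersection type.

    mu: current model probabilities, ordered as MODELS
    Pi: Markov model-transition matrix, Pi[i, j] = P(model j next | model i now)
    The standard IMM prediction c_j = sum_i Pi[i, j] * mu_i is computed, models
    excluded at this intersection are zeroed, and the rest are renormalized.
    """
    c = Pi.T @ np.asarray(mu, dtype=float)
    allowed = np.array([m in ALLOWED[intersection_type] for m in MODELS], dtype=float)
    c = c * allowed
    if c.sum() == 0.0:
        c = allowed  # no mass left on allowed models: restart them uniformly
    return c / c.sum()


Pi = np.full((3, 3), 0.05) + 0.85 * np.eye(3)  # hypothetical: models tend to persist
mu = [0.8, 0.1, 0.1]                            # vehicle currently moving straight
print(predicted_model_probs(mu, Pi, "T"))     # CV excluded, turns share the mass
print(predicted_model_probs(mu, Pi, "plus"))  # all three models kept
```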

Leveraging DIRSIG simulations to create the system prototype

To develop and test the system in a controlled environment with known ground truth, we use
synthetic imagery generated by the Digital Imaging and Remote Sensing Image Generation (DIRSIG)
model. DIRSIG is a first-principles image generation model that computes time- and
material-dependent surface temperatures, incorporates atmospheric contributions using MODTRAN
[9], and predicts bidirectional reflectance distribution functions to render realistic image
sets. In addition, the Simulation of Urban Mobility (SUMO) traffic simulator has been integrated
with DIRSIG to produce dynamic imagery for tracking scenarios. SUMO can simulate both vehicular
and pedestrian movement, but only vehicular traffic is considered in this study. Different paint
models are used for different vehicles. The motivation for using synthetic data is that, since
the true positions and characteristics of objects in a synthetic image are known, performance
metrics for the tracking system can be computed accurately. Furthermore, multiple scenarios, and
multiple sampling strategies within those scenarios, can be evaluated without running multiple
field experiments. The scenario used in this study comes from DIRSIG Megascene I, which is built
to resemble part of Rochester, NY, USA. The simulation uses hyperspectral imaging from a fixed
aerial platform, assuming a static sensor mount. To test the algorithm, we designed a 30-second
video sequence from these images featuring vehicle motion variations generated with SUMO. The
spectral range is 400 to 1000 nm with a spectral resolution of 5 nm, so the generated
hyperspectral images have (1000 - 400)/5 + 1 = 121 wavelength bands. Since the base imagery is
hyperspectral, any adaptive sensor operating in the visible to near-infrared (VNIR) wavelengths
could be simulated.
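Because every simulated pixel carries a full 121-band spectrum, an adaptive multimodal readout can be emulated by subsetting the cube. The sketch below assumes a hypothetical sensor that reports three broad bands everywhere (band edges chosen for illustration only) and full spectra only at tasked pixels; it is not the sensor model used in the project.

```python
import numpy as np

WAVELENGTHS = np.arange(400, 1001, 5)  # 400-1000 nm at 5 nm: 121 bands
# Hypothetical broad-band channels (nm) for the simulated adaptive sensor.
BROAD_BANDS = {"blue": (450, 500), "green": (520, 600), "red": (620, 700)}


def simulate_adaptive_readout(cube, spectral_mask):
    """Emulate an adaptive multimodal sensor from a simulated hyperspectral cube.

    cube:          (rows, cols, 121) hyperspectral image
    spectral_mask: (rows, cols) boolean, True where full spectra are tasked
    Returns a broad-band image for every pixel and full spectra only at the
    tasked pixels.
    """
    broad = np.stack(
        [cube[:, :, (WAVELENGTHS >= lo) & (WAVELENGTHS <= hi)].mean(axis=2)
         for lo, hi in BROAD_BANDS.values()],
        axis=2)
    spectra = cube[spectral_mask]  # (n_tasked, 121)
    return broad, spectra


cube = np.random.default_rng(1).random((20, 20, WAVELENGTHS.size))
mask = np.zeros((20, 20), dtype=bool)
mask[8:12, 9:10] = True  # e.g. a few pixels at a forecast vehicle location
broad, spectra = simulate_adaptive_readout(cube, mask)
print(WAVELENGTHS.size, broad.shape, spectra.shape)  # 121 (20, 20, 3) (4, 121)
```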

Conclusion - A dynamic data driven application system prototype applicable to an adaptive
multimodal sensor, together with a proposed adaptive sampling strategy, was demonstrated to
greatly reduce the amount of data required to perform target identification. Without an adaptive
multimodal sensor, the default would be to collect a full spectral image. With the adaptive
sampling strategy, only about 10% of the image pixels were selected to collect spectral data for
feature matching in the prototype examined here, and the data volume was about 1.5% of the data
collected by a hyperspectral sensor. Target identification was shown to improve with a background
data elimination method designed to remove redundant spectral data, and an adaptive forecasting
strategy based on context extracted from OpenStreetMap improved tracking accuracy and target
identification. In summary, the vehicle tracking adjusts to the vehicle movement, the background
environment, and the road network as derived from the imagery used in the tracking.

Theses to be published

Most of the actual vehicle tracking code, including the implementation of the GSF and the
adaptive sensing strategies using DDDAS principles, is the work of Ph.D. student Burak Uzkent.
Mr. Uzkent is expected to defend his dissertation in winter 2014-15. Tentative dissertation
title: Persistent Ground Target Tracking using Dynamic Data Driven Adaptive Optical Sensor.

The spectral similarity measure and the OpenStreetMap technique are the work of Ph.D. student
Bin Chen. Mr. Chen is expected to defend his dissertation in August 2014. Tentative dissertation
title: Scene Content Understanding of High-resolution Remote Sensing Imagery.

These two dissertations, like all RIT theses and dissertations, will be made available online
in PDF format at the RIT Digital Media Library <https://ritdml.rit.edu/> as a degree requirement.
Appendix A - Pseudo code for the image and OpenStreetMap conflation operation


The pseudo code for the image and OpenStreetMap conflation operation is listed below. For a
detailed description of the algorithm, please refer to the IGARSS 2014 paper: Bin Chen, Weihua
Sun, Anthony Vodacek, “Improving Image-Based Characterization of Road Intersections, Widths, and
Connectivity by Leveraging OpenStreetMap Vector Map”, 2014 IEEE International Geoscience and
Remote Sensing Symposium (IGARSS), 2014.

Given a geo-referenced multispectral image MSI

// Generate Binary Road Mask BRM
Initialize BRM as an all-false mask the size of MSI
Foreach reference spectrum REF representing road-like pixels
    Foreach pixel X in MSI
        SSM(X) = Spectral similarity measurement (X, REF)
        Threshold SSM(X) -> binary SSM value BSM(X)
    End
    BRM = OR(BRM, BSM)
End

// Generate Curvilinear Response Image CRI
Foreach filter width w
    Foreach filter orientation theta
        Create filter bank FB(w, theta)
        CRI(w, theta) = conv(FB(w, theta), MSI)
    End
End

// Generate Vector Road Network VEC
Import ShapeFile from OpenStreetMap
Standardize ShapeFile -> VEC

// Road Junction Matching
Foreach road junction JCT in VEC
    Foreach junction branch width option
        Create junction template TEMP from localized VEC
        // Generate Correlation Maps (CM)
        CM1 = conv(TEMP, BRM)
        CM2 = conv(TEMP, CRI)
        Keep the running maximum of CM1 and CM2 over all width options
    End
    Max(CM1, CM2) -> (Max Location, Corresponding Filter Bank)
End

// Pre-Correction of Road Network
Foreach road segment SEG in VEC
    Foreach junction point pair within SEG
        2D_Interpolation(Max Location) -> Pre-corrected SEG
    End
End

// Non-Junction Point Matching
Foreach pre-corrected SEG
    Transverse_Search(SEG) -> (Conflated SEG, SEG_width)
End

// Road Pixel Extraction
Foreach conflated SEG
    Expand SEG with width SEG_width
End
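The Binary Road Mask step of the pseudo code can be made concrete in Python as follows. SAM stands in for the unspecified "Spectral similarity measurement", and the 0.1 rad threshold is a hypothetical value; the other steps (filter banks, template matching, transverse search) are not shown.

```python
import numpy as np


def binary_road_mask(msi, references, max_angle=0.1):
    """Binary road mask (BRM) step of the conflation pseudo code.

    msi:        (rows, cols, bands) multispectral image
    references: (k, bands) spectra of road-like pixels chosen as references
    max_angle:  hypothetical SAM threshold (radians)
    For each reference, thresholding the similarity gives a binary map (BSM),
    and the maps are OR-ed together into the BRM.
    """
    X = msi.reshape(-1, msi.shape[2]).astype(float)
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    brm = np.zeros(X.shape[0], dtype=bool)
    for ref in np.asarray(references, dtype=float):
        cos = Xn @ (ref / np.linalg.norm(ref))
        bsm = np.arccos(np.clip(cos, -1.0, 1.0)) <= max_angle  # Threshold SSM -> BSM
        brm |= bsm                                               # BRM = OR(BRM, BSM)
    return brm.reshape(msi.shape[:2])


msi = np.array([[[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],
                [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
print(binary_road_mask(msi, references=[[1, 1, 1], [0, 1, 0]]))
```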